NSF PAR Search | NSF Public Access Repository


MR.RGM: an R package for fitting Bayesian multivariate bidirectional Mendelian randomization networks

https://doi.org/10.1093/bioinformatics/btaf130

Sarkar, Bitan; Ni, Yang; Cowen, Lenore (ed.) (March 2025, Bioinformatics)

Abstract
Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only one exposure-outcome pair at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing MR.RGM (Mendelian randomization via reciprocal graphical model), a fast R package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantification, offering a comprehensive understanding of complex biological systems and their interconnections.
Results: We developed MR.RGM, an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. MR.RGM holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities.
Availability and implementation: MR.RGM is available at CRAN (https://CRAN.R-project.org/package=MR.RGM, DOI: 10.32614/CRAN.package.MR.RGM) and https://github.com/bitansa/MR.RGM.
Joint embedding of biological networks for cross-species functional alignment

https://doi.org/10.1093/bioinformatics/btad529

Li, Lechuan; Dannenfelser, Ruth; Zhu, Yu; Hejduk, Nathaniel; Segarra, Santiago; Yao, Vicky; Cowen, Lenore (ed.) (August 2023, Bioinformatics)

Abstract
Motivation: Model organisms are widely used to better understand the molecular causes of human disease. While sequence similarity greatly aids this cross-species transfer, sequence similarity does not imply functional similarity, and thus, several current approaches incorporate protein–protein interactions to help map findings between species. Existing transfer methods either formulate the alignment problem as a matching problem which pits network features against known orthology, or more recently, as a joint embedding problem.
Results: We propose a novel state-of-the-art joint embedding solution: Embeddings to Network Alignment (ETNA). ETNA generates individual network embeddings based on network topological structure and then uses a Natural Language Processing-inspired cross-training approach to align the two embeddings using sequence-based orthologs. The final embedding preserves both within and between species gene functional relationships, and we demonstrate that it captures both pairwise and group functional relevance. In addition, ETNA’s embeddings can be used to transfer genetic interactions across species and identify phenotypic alignments, laying the groundwork for potential opportunities for drug repurposing and translational studies.
Availability and implementation: https://github.com/ylaboratory/ETNA
Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer

https://doi.org/10.1093/bioinformatics/btad298

Wu, Tianqi; Guo, Zhiye; Cheng, Jianlin; Cowen, Lenore (ed.) (May 2023, Bioinformatics)

Abstract
Motivation: State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph.
Results: The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics including an all-atom model quality score, the MolProbity score, which is based on the analysis of all-atom contacts, bond length, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement.
Availability and implementation: The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.
MCPNet: a parallel maximum capacity-based genome-scale gene network construction framework

https://doi.org/10.1093/bioinformatics/btad373

Pan, Tony C.; Chockalingam, Sriram P.; Aluru, Maneesha; Aluru, Srinivas; Cowen, Lenore (ed.) (June 2023, Bioinformatics)

Abstract
Motivation: Gene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been proposed. However, an effective gene network reconstruction method that performs well in all three aspects of computational efficiency, data size scalability, and output quality remains elusive. Simple techniques such as Pearson correlation are fast to compute but ignore indirect interactions, while more robust methods such as Bayesian networks are prohibitively time consuming to apply to tens of thousands of genes.
Results: We developed the maximum capacity path (MCP) score, a novel maximum-capacity-path-based metric to quantify the relative strengths of direct and indirect gene–gene interactions. We further present MCPNet, an efficient, parallelized gene network reconstruction software based on the MCP score, to reverse engineer networks in unsupervised and ensemble manners. Using synthetic and real Saccharomyces cerevisiae datasets as well as real Arabidopsis thaliana datasets, we demonstrate that MCPNet produces better quality networks as measured by AUPRC, is significantly faster than all other gene network reconstruction software, and also scales well to tens of thousands of genes and hundreds of CPU cores. Thus, MCPNet represents a new gene network reconstruction tool that simultaneously achieves quality, performance, and scalability requirements.
Availability and implementation: Source code is freely available for download at https://doi.org/10.5281/zenodo.6499747 and https://github.com/AluruLab/MCPNet, implemented in C++ and supported on Linux.
3D-equivariant graph neural networks for protein model quality assessment

https://doi.org/10.1093/bioinformatics/btad030

Chen, Chen; Chen, Xiao; Morehead, Alex; Wu, Tianqi; Cheng, Jianlin; Cowen, Lenore (ed.) (January 2023, Bioinformatics)

Abstract
Motivation: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods.
Results: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA.
Availability and implementation: The source code is available at https://github.com/BioinfoMachineLearning/EnQA.
Supplementary information: Supplementary data are available at Bioinformatics online.
mebipred: identifying metal-binding potential in protein sequence

https://doi.org/10.1093/bioinformatics/btac358

Aptekmann, A. A.; Buongiorno, J.; Giovannelli, D.; Glamoclija, M.; Ferreiro, D. U.; Bromberg, Y.; Cowen, Lenore (ed.) (May 2022, Bioinformatics)

Abstract
Motivation: Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.
Results: We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediment, and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements.
Availability and implementation: mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.
Supplementary information: Supplementary data are available at Bioinformatics online.
RCSB Protein Data Bank 1D3D module: displaying positional features on macromolecular assemblies

https://doi.org/10.1093/bioinformatics/btac317

Segura, Joan; Rose, Yana; Bittrich, Sebastian; Burley, Stephen K.; Duarte, Jose M.; Cowen, Lenore (ed.) (May 2022, Bioinformatics)

Abstract
Motivation: Mapping positional features from one-dimensional (1D) sequences onto three-dimensional (3D) structures of biological macromolecules is a powerful tool to show geometric patterns of biochemical annotations and provide a better understanding of the mechanisms underpinning protein and nucleic acid function at the atomic level.
Results: We present a new library designed to display fully customizable interactive views between 1D positional features of protein and/or nucleic acid sequences and their 3D structures as isolated chains or components of macromolecular assemblies.
Availability and implementation: https://github.com/rcsb/rcsb-saguaro-3d.
Supplementary information: Supplementary data are available at Bioinformatics online.
pystablemotifs: Python library for attractor identification and control in Boolean networks

https://doi.org/10.1093/bioinformatics/btab825

Rozum, Jordan C.; Deritei, Dávid; Park, Kyu Hyong; Gómez Tejeda Zañudo, Jorge; Albert, Réka; Cowen, Lenore (ed.) (December 2021, Bioinformatics)

Abstract
Summary: pystablemotifs is a Python 3 library for analyzing Boolean networks. Its non-heuristic and exhaustive attractor identification algorithm was previously presented in Rozum et al. (2021). Here, we illustrate its performance improvements over similar methods and discuss how it uses outputs of the attractor identification process to drive a system to one of its attractors from any initial state. We implement six attractor control algorithms, five of which are new in this work. By design, these algorithms can return different control strategies, allowing for synergistic use. We also give a brief overview of the other tools implemented in pystablemotifs.
Availability and implementation: The source code is on GitHub at https://github.com/jcrozum/pystablemotifs/.
Supplementary information: Supplementary data are available at Bioinformatics online.
